Supplementary Final Report

Author: David West

This supplementary final report explores the synaptic density data further. The primary final report was prepared on May 5, 2016, and examines the 2011 M. musculus V1 dataset from "Network anatomy and in vivo physiology of visual cortical neurons" (Bock et al.). Among its key insights, a primary finding of the initial analyses was that the y-axis of the 3D volume is aligned with the z-axis of the brain, i.e., the axis along which cortical layers vary. The report identified key inflection points in synaptic density down the volume's y-axis, and offered a cursory analysis of clustering and trends in synaptic distribution down the y-axis, as well as along the x and z axes.

Abstract

As tissue varies down the cortical layers, multiple properties become evident.

Here, we'll perform various analyses by constructing graphs from the data and measuring properties of those graphs to learn more about how the tissue changes.


In [3]:
import csv
from scipy.stats import kurtosis
from scipy.stats import skew
from scipy.spatial import Delaunay
import numpy as np
import math
import skimage
import matplotlib.pyplot as plt
import seaborn as sns
from skimage import future
import networkx as nx
from ragGen import *
%matplotlib inline
sns.set_color_codes("pastel")
from scipy.signal import argrelextrema

In [4]:
# Read in the data
data = open('../../data/data.csv', 'r').readlines()
fieldnames = ['x', 'y', 'z', 'unmasked', 'synapses']
reader = csv.reader(data)
reader.next()  # skip the header row (Python 2 csv iterator)

rows = [[int(col) for col in row] for row in reader]

# These will come in handy later
sorted_x = sorted(list(set([r[0] for r in rows])))
sorted_y = sorted(list(set([r[1] for r in rows])))
sorted_z = sorted(list(set([r[2] for r in rows])))

We'll start with analysis in Euclidean space alone, then think about weighting by synaptic density later. Since we hypothesize that our data will show the tissue varying as we move down the y-axis (the z-axis in the brain) through cortical layers, an interesting thing to do would be to compare properties of the graphs on each layer (i.e., how does graph connectivity vary as we move through layers?).

Let's start by triangulating our data, applying Delaunay triangulation to each y-layer. First, we put our data in the right format for graph analysis...

Now that our data is in the right format, we'll create 52 Delaunay graphs, one per y-layer, and then perform analyses on these graphs. A simple but useful first metric is the edge length distribution in each layer; a sketch of the construction follows.
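
The triangulation code itself isn't shown in this report, so here is a minimal sketch of how each layer's Delaunay graph and its edge lengths could be built from the rows defined above. The helper name and the use of (x, z) coordinates per y-layer are assumptions, not code from the original analysis.


In [ ]:
# Sketch (assumed helper): triangulate one y-layer and return its Delaunay
# edge lengths.
def delaunay_edge_lengths(layer_index):
    # take the (x, z) coordinates of every voxel in this y-layer
    points = np.array([[r[0], r[2]] for r in rows
                       if r[1] == sorted_y[layer_index]])
    tri = Delaunay(points)
    graph = nx.Graph()
    for simplex in tri.simplices:
        # each triangle contributes its three edges
        for i in range(3):
            a, b = simplex[i], simplex[(i + 1) % 3]
            graph.add_edge(a, b, weight=np.linalg.norm(points[a] - points[b]))
    return [d['weight'] for _, _, d in graph.edges(data=True)]

lengths_layer0 = delaunay_edge_lengths(0)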

The edge length distributions turn out to carry no information: the voxels lie on an evenly spaced grid, so the distances are essentially identical in every layer. It is therefore more useful to consider a graph that incorporates node weights. Voronoi is the dual of Delaunay, so that's not much of an option either. We want something that considers both spatial location and density similarity.

Description of Region Adjacency Graphs (RAG)

In the past we've considered density information alone (i.e., analyzing the density histogram distribution), and above we considered only spatial information, which by itself tells us nothing here. To construct a graph that considers both spatial and density information, we'll use a Region Adjacency Graph (RAG).

In RAGs, two nodes are considered neighbors if they are close in proximity (separated by a small number of pixels/voxels) in the horizontal or vertical direction.

Since our data includes density values at each node (voxel, or pixel, since we're looking at y-layers), we can weight each edge by the inverse of the density difference between its two nodes. We use the inverse so that nodes similar in density are strongly connected.

We have the number of synapses $S_i$ at node $i$, and define the weight $w$ between neighboring nodes as:

$$w = \dfrac{1}{|S_i - S_{i+1}|}$$
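
As a concrete worked example (numbers invented purely for illustration): if neighboring supervoxels contain $S_i = 120$ and $S_{i+1} = 115$ synapses, then $w = 1/|120 - 115| = 0.2$; a pair differing by a single synapse gets $w = 1$, and identical densities make the weight blow up, so the implementation has to handle zero differences specially.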

RAGs are used largely in image processing, so it makes sense for our data to look more like an image. Since the data is evenly spaced, the absolute locations of the voxels don't matter; we can use the index in the matrix to represent spatial location, with the value at each pixel being the synapse density at that voxel. We've done this before in "real volume".


In [5]:
# Volume indexed (y, x, z), so iterating over the first axis walks down y-layers
real_volume = np.zeros((len(sorted_y), len(sorted_x), len(sorted_z)))
for r in rows:
    real_volume[sorted_y.index(r[1]), sorted_x.index(r[0]), sorted_z.index(r[2])] = r[-1]

# Same data indexed (x, y, z)
vol = np.zeros((len(sorted_x), len(sorted_y), len(sorted_z)))
for r in rows:
    vol[sorted_x.index(r[0]), sorted_y.index(r[1]), sorted_z.index(r[2])] = r[-1]

Creating Non-linear RAGs for each layer


In [11]:
y_rags_nonlinear = []
for layer in real_volume:
    y_rags_nonlinear.append(generate_rag(layer, False))
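
generate_rag is imported from ragGen, whose source isn't shown in this report. For readability, here is a minimal sketch of what a construction like it plausibly does under the weighting described above; the 4-adjacency, the zero-difference guard, and the function body are all assumptions, not the actual ragGen implementation (its boolean flag presumably toggles the linear weighting explored later).


In [ ]:
# Hypothetical sketch of a generate_rag-style constructor (NOT the actual
# ragGen code): connect horizontally/vertically adjacent voxels of a 2D layer
# and weight each edge by the inverse of the density difference.
def generate_rag_sketch(layer):
    graph = nx.Graph()
    n_rows, n_cols = layer.shape
    for i in range(n_rows):
        for j in range(n_cols):
            for di, dj in [(0, 1), (1, 0)]:  # right and down neighbors only
                ni, nj = i + di, j + dj
                if ni < n_rows and nj < n_cols:
                    diff = abs(layer[i, j] - layer[ni, nj])
                    # assumed guard: identical densities treated as maximal similarity
                    weight = 1.0 / diff if diff > 0 else 1.0
                    graph.add_edge((i, j), (ni, nj), weight=weight)
    return graph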

OK, great! Now we have a list of 52 region adjacency graphs, one for each y-layer. Now let's generate edge weight distributions.


In [12]:
def get_edge_weight_distributions(rags):
    # Collect the list of edge weights for each RAG
    distributions = []
    for rag in rags:
        weight_list = [attrs['weight'] for _, _, attrs in rag.edges_iter(data=True)]
        distributions.append(weight_list)
    return distributions

In [14]:
distributions_nonlinear = get_edge_weight_distributions(y_rags_nonlinear)
distributions_nonlinear = distributions_nonlinear[:40]  # crop the 12 padding layers

Exploration of Graph Data

Full edge weight histograms down y-axis


In [15]:
for count, distr in enumerate(distributions_nonlinear):
    plt.hist(distr, bins=150)
    plt.title("Layer " + str(count) + " Edge Weight Histogram")
    plt.show()


Note that we cropped off the data's padding, ignoring the edge layers, which appear to be inconsistent due to the data generation methods (i.e., there is no tissue there).

Trends in Graph Connectivity Using Mean Edge Weights

The edge weights are a measure of connectivity. Thus, we want to see how connectivity changes through the y-layers, which we are confident represent progressively deeper cortical layers. We'll start by taking the mean edge weight of each layer as a measure of that layer's connectivity.

Note: Connectivity here is not referring to connectivity in the neuroscience sense, but rather in the graph theory sense. Closely "connected" supervoxels are close in density, but might both be low in absolute synaptic density, and thus less connected in the neuroscience sense. For the purposes of this analysis, I'll generally take connectivity to mean graph connectivity unless otherwise stated.


In [114]:
y_edge_means = []
for distrib in distributions_nonlinear:
    y_edge_means.append(np.mean(distrib))

In [115]:
g = sns.barplot(x=range(len(y_edge_means)), y=y_edge_means, color='r')
g.set_xticks(g.get_xticks()[::2])
plt.xticks(g.get_xticks(), g.get_xticks())
plt.title("Non-Linear Edge Weight Means for Each Y Layer")
plt.xlabel("Y-Axis Layer")
plt.ylabel("Edge Weight Mean")
sns.plt.show()


While it is difficult to see any pattern that might quantitatively separate cortical layers, as we saw in the simple mean density plots through the y-axis, we do see a trend of increasing mean edge weight as we get deeper into the y-layers. Specifically, we can see below that connectivity increases by about 33% between the first and last layers considered (layers 0 and 39). Remember that we found that the tissue becomes less dense as we traverse down the y-axis (see here). So, at the least, we see that as the tissue becomes less dense, the connectivity of the RAG increases.


In [117]:
start_val = y_edge_means[0]
end_val = y_edge_means[-1]
(end_val - start_val) / start_val


Out[117]:
0.32959191057313436

Further Understanding the Mean Edge Weights and RAG in General

These results are intriguing. Why would connectivity (as we define it) increase with cortical depth as the tissue becomes less synaptically-dense? Let's look at this further.

Because edge weight is a function of the difference in density between neighboring supervoxels, the increase in average edge weight might be due to the distribution of synapses becoming sparser, and thus more irregular, through the tissue. It's worth noting that we have already eliminated the possibility of edge effects: since the deepest 12 y-layers are matrices of nearly all 0's with some noise, one would expect their edge means to be very high. Indeed, we can see below that when we put those layers back in, there is a large spike across the last 12 layers, where the padding becomes relevant.
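
As a quick sanity check (a sketch, not part of the original analysis), we can compare the mean density in the cropped padding layers against the layers we kept:


In [ ]:
# Sketch: the deepest 12 y-layers should be nearly empty compared to the rest
print(np.mean(real_volume[40:]))   # padding layers (mostly zeros plus noise)
print(np.mean(real_volume[:40]))   # layers kept for the analysis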


In [118]:
distributions_uncropped = get_edge_weight_distributions(y_rags_nonlinear)  # all 52 layers, padding included

In [112]:
y_edge_means_uncropped = []
for distrib in distributions_uncropped:
    y_edge_means_uncropped.append(np.mean(distrib))

In [113]:
g = sns.barplot(x=range(len(y_edge_means_uncropped)), y=y_edge_means_uncropped, color='r')
g.set_xticks(g.get_xticks()[::2])
plt.xticks(g.get_xticks(), g.get_xticks())
plt.title("Non-Linear Edge Weight Means for Each Y Layer, No Padding")
plt.xlabel("Y-Axis Layer")
plt.ylabel("Edge Weight Mean")
sns.plt.show()


In interpreting this RAG, let's hypothesize that graph connectivity is negatively correlated with density variance. It's reasonable to expect that higher variance in density would translate into an increased probability of having two very different supervoxels next to each other, and thus lower graph connectivity. To check this, let's look at the trend in variance.


In [120]:
y_density_variances = []
for layer in real_volume:
    y_density_variances.append(np.var(layer))
    
g = sns.barplot(x=range(len(y_density_variances[:40])), y=y_density_variances[:40], color='g')
g.set_xticks(g.get_xticks()[::2])
plt.xticks(g.get_xticks(), g.get_xticks())
plt.title("Variance of Density by Y-Layer")
plt.xlabel("Y-Axis Layer")
plt.ylabel("Density Variance")
sns.plt.show()


Indeed, as variance increases, connectivity decreases: with higher variance, it is more likely that neighboring supervoxels differ greatly in synaptic density. We can see this correlation quantitatively with the Pearson coefficient between the density variance and the mean edge weight:


In [124]:
np.corrcoef(y_density_variances[:40], y_edge_means)[0,1]


Out[124]:
-0.85814925572734968

Confirmed: we have a very strong negative correlation between density variance and mean graph connectivity through the y-layers. However, note that it is not a perfect correlation; mean edge weight tells us something slightly different from simple density variance. This is likely due to a few properties of the RAG as we calculated it.

  1. The RAG considers differences in synaptic density for supervoxels in close spatial proximity to each other. Thus, in a way, connectivity measures local clustering.
  2. The edge weights are given by a non-linear function of synaptic density distance where $$w = \dfrac{1}{|S_i - S_{i+1}|}$$
  3. The weighting function is not normalized by the range of density differences within each layer.

Thus, regarding points 2 and 3, it would be interesting to see how a different weighting function, one that is linear and normalized by layer, behaves. Perhaps we would see trends in the data that could act as a feature that not only shows how density and clustering change through the cortical layers, but also discriminates between those layers.

Edge Weight Variances


In [58]:
y_edge_vars = []
for distrib in distributions_nonlinear:
    y_edge_vars.append(np.var(distrib))

In [59]:
g = sns.barplot(x=range(len(y_edge_vars)), y=y_edge_vars, color='r')
g.set_xticks(g.get_xticks()[::2])
plt.xticks(g.get_xticks(), g.get_xticks())
plt.title("Non-Linear Edge Weight Variances for Each Y Layer")
plt.xlabel("Y-Axis Layer")
plt.ylabel("Edge Weight Varience")
sns.plt.show()


Edge Weight Third Moments (Skewness)


In [17]:
y_edge_skews = []
for distrib in distributions_nonlinear:
    y_edge_skews.append(skew(distrib))

In [18]:
g = sns.barplot(x=range(len(y_edge_skews)), y=y_edge_skews, color='r')
g.set_xticks(g.get_xticks()[::2])
plt.xticks(g.get_xticks(), g.get_xticks())
plt.title("Non-Linear Edge Weight Skewness for Each Y Layer")
plt.xlabel("Y-Axis Layer")
plt.ylabel("Edge Weight Skewness")
sns.plt.show()


Edge Weight Fourth Moments (Kurtosis)


In [65]:
y_edge_kurts = []
for distrib in distributions_nonlinear:
    y_edge_kurts.append(kurtosis(distrib)) 

print y_edge_kurts


[10.23804038626901, 9.659871939560425, 10.126489421844912, 9.487842928130155, 8.340949835038519, 9.079843616654733, 8.4577457245625, 8.796762945659259, 7.612789434857364, 8.450711707218769, 7.983654339058523, 8.617154019363616, 7.8199708817582145, 7.488860404537743, 6.836466100862996, 7.061271285090223, 7.228189540329977, 7.9896995574107095, 7.2965224485370275, 7.579757603659418, 7.6402543849600875, 7.329127582707219, 7.73957764837176, 7.541436736475237, 7.811626007448753, 7.5318810342862434, 7.445028685529282, 6.8050071473964575, 6.590949948186164, 7.108153535380023, 6.952258902862194, 5.938001000171235, 6.387577398849082, 5.859616601257473, 6.332259008643769, 5.356473182606829, 5.514073928343686, 5.703841872292106, 6.318236047886609, 5.642698148627998]

In [66]:
sns.barplot(x=range(len(y_edge_kurts)), y=y_edge_kurts)
sns.plt.show()


Very interesting: the kurtosis decreases fairly steadily with depth, from roughly 10.2 at layer 0 to roughly 5.6 at layer 39. The edge weight distributions become less heavy-tailed as we move down the y-layers.
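
To put a number on that apparent trend (a quick check, not part of the original analysis), we can correlate kurtosis with layer index:


In [ ]:
# Sketch: correlation between layer index and edge weight kurtosis
print(np.corrcoef(range(len(y_edge_kurts)), y_edge_kurts)[0, 1])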

Linear graph weights

We're now going to change our weighting function to be linear and scaled by the max and min difference within each layer. This might help eliminate some of the edge effect behavior I suspect is causing the rapid change in statistics in the deeper y-layers. A hypothetical sketch of the scaling follows.
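
generate_rag's linear mode isn't shown either, so here is a hypothetical sketch of the per-layer scaling it plausibly performs; the min-max form and the zero-range guard are assumptions, not the actual ragGen implementation.


In [ ]:
# Hypothetical sketch (NOT the actual ragGen code): min-max-scale the density
# difference within a layer, then invert so similar neighbors get high weight.
def linear_scaled_weight(diff, min_diff, max_diff):
    if max_diff == min_diff:
        # assumed guard: a layer with uniform differences gets uniform weights
        return 1.0
    return 1.0 - (float(diff) - min_diff) / (max_diff - min_diff)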


In [34]:
y_rags_linear_weight = []
for layer in real_volume:
    y_rags_linear_weight.append(generate_rag(layer, True))

In [35]:
# Sanity check: build one layer's linear RAG and pull out its edge weights
test_rag = generate_rag(real_volume[4], True)
weight_list = [attrs['weight'] for _, _, attrs in test_rag.edges_iter(data=True)]

In [68]:
distributions_lin = get_edge_weight_distributions(y_rags_linear_weight)

# ignore the padding edge layers
distributions_lin = distributions_lin[:40]

Full edge weight histograms down y-axis


In [29]:
for count, distr in enumerate(distributions_lin):
    plt.hist(distr, bins=150)
    plt.title("Layer " + str(count) + " Linear Edge Weight Histogram")
    plt.show()


Note the very different shape of these distributions. It might be interesting to see what the total edge weight distribution looks like for the full volume (all layers as one). Then we could compare the distributions generated by the two weighting functions.


In [30]:
concat_distr = []
for distr in distributions_lin:
    concat_distr = concat_distr + distr

plt.hist(concat_distr, bins=150)
plt.title("Linear Edge Weight Histogram - All Layers")
plt.show()


Interesting. Now let's go back and compare that to the non-linear, unscaled weighting.


In [31]:
concat_distr2 = []
for distr in distributions_nonlinear:
    concat_distr2 = concat_distr2 + distr

plt.hist(concat_distr2, bins=150)
plt.title("Non-Linear Edge Weight Histogram - All Layers")
plt.show()


Very interesting. Why would that behavior arise from a non-linear weighting function? Is this a property of the data or just of the math? Either way, let's continue with other statistics through the layers generated by the linearly weighted RAG.

Linear Edge Weight Means


In [79]:
y_edge_linear_means = []
for distrib in distributions_lin:
    y_edge_linear_means.append(np.mean(distrib)) 

g = sns.barplot(x=range(len(y_edge_linear_means)), y=y_edge_linear_means, color='b')
g.set_xticks(g.get_xticks()[::2])
plt.xticks(g.get_xticks(), g.get_xticks())
plt.title("Linear, Scaled Edge Weight Means for Each Y Layer")
plt.xlabel("Y-Axis Layer")
plt.ylabel("Edge Weight Mean")
sns.plt.show()


Visually, we can see significant peaks at layers 4, 9, 13, 22, and 32. Let's compare that to the density sum plot. Perhaps that corroborates the evidence we saw earlier from the layer-by-layer density sums.


In [74]:
# Total synapse count in each y-layer
y_sum = [np.sum(vol[:, i, :]) for i in range(vol.shape[1])]
sns.barplot(x=range(len(y_sum[:40])), y=y_sum[:40])


Out[74]:
<matplotlib.axes._subplots.AxesSubplot at 0x11abf98d0>

We see peaks at layers 9, 14, 21, 26, 31, and possibly 35. While the patterns in the edge weight means are not as consistent or noticeable, the peaks are fairly close, especially the earlier ones. We can test this by looking at local minima and maxima analytically.


In [81]:
def local_minima(a):
    return argrelextrema(a, np.less)

In [82]:
density_sum_minima = local_minima(np.array(y_sum))
density_sum_minima


Out[82]:
(array([ 2,  6, 12, 18, 24, 30, 34]),)

Now let's go back and compare that with the minima on the edge weight means.


In [80]:
edge_mean_minima = local_minima(np.array(y_edge_linear_means))
edge_mean_minima


Out[80]:
(array([ 1,  6,  8, 11, 14, 18, 20, 23, 25, 28, 30, 33, 37]),)

There are quite a few local minima, and they are spaced too close together to match up cleanly with the density sum minima. However, the feature that seemed interesting visually was the peaks, so let's define a local maxima finder as well and apply it below.


In [ ]:
def local_maxima(a):
    return argrelextrema(a, np.greater)
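
Applying it to both series (a sketch; the cell above was never executed in the original, so no outputs are recorded here):


In [ ]:
# Sketch: compare peak locations in the density sums and the edge weight means
print(local_maxima(np.array(y_sum[:40])))
print(local_maxima(np.array(y_edge_linear_means)))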

Linear Edge Weight Variance


In [30]:
y_edge_linear_vars = []
for distrib in distributions_lin:
    y_edge_linear_vars.append(np.var(distrib)) 

sns.barplot(x=range(len(y_edge_linear_vars)), y=y_edge_linear_vars)
sns.plt.show()


Linear Edge Weight Skewness


In [31]:
y_edge_linear_skews = []
for distrib in distributions_lin:
    y_edge_linear_skews.append(skew(distrib)) 

sns.barplot(x=range(len(y_edge_linear_skews)), y=y_edge_linear_skews)
sns.plt.show()


Linear Edge Weight Kurtosis


In [32]:
y_edge_linear_kurts = []
for distrib in distributions_lin:
    y_edge_linear_kurts.append(kurtosis(distrib)) 

sns.barplot(x=range(len(y_edge_linear_kurts)), y=y_edge_linear_kurts)
sns.plt.show()


Number of Self Loops


In [ ]:
num_self_loops = []
for rag in y_rags_nonlinear:
    num_self_loops.append(rag.number_of_selfloops())

In [ ]:
num_self_loops

Interesting. There are no self loops. Why would this be? Let's come back to this. In the meantime, I want to give some thought to what it means to have a self loop, whether one should be theoretically possible given our data, and whether our graphs are formed properly.

The answer to this question is very simple. Self loops are edges that connect a node to itself, and in a RAG there are no self loops by definition: a region is never adjacent to itself.

To see whether the graphs are formed properly, let's look at the adjacency lists:
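
A minimal sketch of that inspection (assuming networkx 1.x, which the edges_iter calls above imply):


In [ ]:
# Sketch: print the first few adjacency entries of one layer's RAG to verify
# that nodes only link to distinct neighbors (i.e., no self loops)
rag = y_rags_nonlinear[0]
for node, neighbors in list(rag.adjacency_iter())[:5]:
    print((node, list(neighbors.keys())))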